-
Notifications
You must be signed in to change notification settings - Fork 64
Update Video Encoder and tests for 6 container formats #913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f414d0b
to
c29dee3
Compare
avCodecContext_->global_quality = FF_QP2LAMBDA * qp; | ||
} | ||
int status = avcodec_open2(avCodecContext_.get(), avCodec, &options); | ||
av_dict_free(&options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above, the crf
parameter is reused to set qscale
to encode high quality videos in round-trip tests. But, the C++ function only allows crf
to be set, not qscale
. Since qscale
is not needed anywhere else, I did not think it was worth including, but I am open to feedback here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context.
My understanding is that for those codecs that do not support crf, we set instead the qscale (quantizer scale) parameter. They both control encoding quality, but in different ways.
I think... we should avoid doing that. I don't have a good enough understanding of how these 2 parameters (and their values!) relate to each other, and I think we can punt on that for a first release of the encoder. Especially since we only really need this workaround for our round-trip test to run. It means we won't be able to do the run-trip tests on those formats, but that's OK:
- those formats aren't that popular anyway
- we should still be able to do the test against the FFmpeg CLI.
test/test_ops.py
Outdated
assert ( | ||
source_frames.shape == round_trip_frames.shape | ||
), f"Shape mismatch: source {source_frames.shape} vs round_trip {round_trip_frames.shape}" | ||
assert ( | ||
source_frames.dtype == round_trip_frames.dtype | ||
), f"Dtype mismatch: source {source_frames.dtype} vs round_trip {round_trip_frames.dtype}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use plain assert like
assert source_frames.shape == round_trip_frames.shape
pytest will provide a proper error message
test/test_ops.py
Outdated
atol = 2 | ||
# Check that PSNR for decode(encode(samples)) is above 30 | ||
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | ||
res = psnr(s_frame, rt_frame) | ||
assert res > 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it's not from this PR but let's clean that up a little:
- the comment isn't needed as the code is really self explanatory (and it doesn't just do psnr validation anymore!)
- no need to store
res
atol = 2 | |
# Check that PSNR for decode(encode(samples)) is above 30 | |
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | |
res = psnr(s_frame, rt_frame) | |
assert res > 30 | |
atol = 2 | |
for s_frame, rt_frame in zip(source_frames, round_trip_frames): | |
assert psnr(s_frame, rt_frame) > 30 |
assert_close(s_frame, rt_frame, atol=atol, rtol=0) | ||
|
||
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") | ||
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's duplicated
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available") |
test/test_ops.py
Outdated
"Codec for webm is not available in the FFmpeg6/7 installation on Windows." | ||
) | ||
asset = TEST_SRC_2_720P | ||
# Test that decode(encode(decode(asset))) == decode(asset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be at the top
test/test_ops.py
Outdated
f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes()) | ||
|
||
ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}") | ||
# Test that lossless encoding is identical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this comment is needed here, it looks out of place
|
||
const AVPixelFormat* getSupportedPixelFormats(const AVCodec& avCodec) { | ||
const AVPixelFormat* supportedPixelFormats = nullptr; | ||
#if LIBAVCODEC_VERSION_INT >= AV_VERSION_INT(61, 13, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment to specify which major FFmpeg version this correspond to
src/torchcodec/_core/Encoder.cpp
Outdated
// av_packet_rescale_ts ensures encoded frames have correct timestamps. | ||
// This prevents "no more frames" errors when decoding encoded frames, | ||
// https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (packet->duration == 0 && codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
// 1 means that 1 frame (in codec time base, which is the frame rate)
// This has to be set before av_packet_rescale_ts bellow.
packet->duration = 1;
}
which seems to be about the lines just above.
Is this comment at the right place? Maybe it should be a few lines above - and it should also explain why we need to set duration to 1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the comment to be above the packet->duration
and av_packet_rescale_ts
lines and updated the text to reference it as "the code below". Let me know if I misunderstood your suggestion
UniqueEncodingAVFormatContext avFormatContext_; | ||
UniqueAVCodecContext avCodecContext_; | ||
int streamIndex_ = -1; | ||
AVStream* avStream_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we now store avStream_ mostly because we need to access time_base
? If that's the case, then let's get rid of the streamIndex_
field because it can now be accessed through avStream_
fe8fb87
to
4516b35
Compare
@Dan-Flores has imported this pull request. If you are a Meta employee, you can view this in D84393092. |
This PR updates the VideoEncoder to support encoding for common video container formats, nearly identically to the FFmpeg CLI.
Changes:
Some changes are made to align with the design in #907:
crf
as an option on the C++ side to enable round trip testsTesting
test_video_encoder_round_trip
: Ensures that a video's decoded frames are the same after encoding then decoding.mov
,mp4
,mkv
,webm
test_video_encoder_against_ffmpeg_cli
: Ensures that the VideoEncoder frames are the same as the FFmpeg CLI.mov
,mp4
,avi
,mkv
,webm
,flv
,gif
Testing caveats
crf
parameter is needed to test lossless encoding in the round trip test. For formats that do not supportcrf
, the round trip test is not availablle.assert_tensor_close_on_at_least
with a lower percentage match (96-99), and a higheratol
(2-15).